StatAlign 2.0: combining statistical alignment with RNA secondary structure prediction
Authors
Abstract
MOTIVATION: Comparative modeling of RNA is known to be important for making accurate secondary structure predictions. RNA structure prediction tools such as PPfold or RNAalifold take an aligned set of sequences as input. Obtaining a multiple alignment from a set of sequences is itself a challenging problem, and the quality of the alignment can affect the quality of the prediction. By implementing RNA secondary structure prediction in a statistical alignment framework, and predicting structures from multiple alignment samples instead of a single fixed alignment, it may be possible to improve predictions.

RESULTS: We have extended the program StatAlign to make use of RNA-specific features, which include RNA secondary structure prediction from multiple alignments using either a thermodynamic approach (RNAalifold) or a stochastic context-free grammar (SCFG) approach (PPfold). We also provide the user with scores relating to the quality of a secondary structure prediction, such as information entropy values for the combined space of secondary structures and sampled alignments, and a reliability score that predicts the expected number of correctly predicted base pairs. Finally, we have created RNA secondary structure visualization plugins and automated the process of setting up Markov chain Monte Carlo runs for RNA alignments in StatAlign.

AVAILABILITY AND IMPLEMENTATION: The software is available from http://statalign.github.com/statalign/.
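To make the idea of an information entropy score concrete, here is a minimal sketch that computes the Shannon entropy of an empirical distribution of sampled structures. It is an illustration only and not StatAlign's formula: the abstract describes entropy over the combined space of secondary structures and sampled alignments, whereas this sketch assumes a flat list of dot-bracket strings. The function name `structure_entropy` and the sample data are hypothetical.

```python
import math
from collections import Counter


def structure_entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples.

    Assumption: each sample is a dot-bracket string, and identical strings
    count as the same structure. This is a simplification for illustration.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    # Sum -p * log2(p) over every distinct structure that was observed.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical samples: one structure seen twice, two others seen once each.
samples = ["((..))", "((..))", "(....)", "......"]
entropy_bits = structure_entropy(samples)
```

A low value means the samples agree on one structure; a high value means the samples are spread over many alternatives, which suggests a less certain prediction.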
Similar resources
Novel representation of RNA secondary structure used to improve prediction algorithms.
We propose a novel representation of RNA secondary structure for a quick comparison of different structures. Secondary structure was viewed as a set of stems and each stem was represented by two values according to its position. Using this representation, we improved the comparative sequence analysis method results and the minimum free-energy model. In the comparative sequence analysis met...
A folding algorithm for extended RNA secondary structures
MOTIVATION RNA secondary structure contains many non-canonical base pairs of different pair families. Successful prediction of these structural features leads to improved secondary structures with applications in tertiary structure prediction and simultaneous folding and alignment. RESULTS We present a theoretical model capturing both RNA pair families and extended secondary structure motifs ...
Annual Poster Presentation on November 21, 2008
Codon usage bias refers to differences among organisms in the frequency of occurrence of codons in protein-coding DNA sequences. This bias in codon preference has been reported in most genomes that have been studied so far. In some organisms, highly expressed genes have a strong codon preference that is consistent with the concentrations of corresponding tRNAs, whereas genes expressed at a lowe...
DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition
MOTIVATION It is well known that the accuracy of RNA secondary structure prediction from a single sequence is limited, and thus a comparative approach that predicts a common secondary structure from aligned sequences is a better choice if homologous sequences with reliable alignments are available. However, correct secondary structure information is needed to produce reliable alignments of RNA ...
Simultaneous alignment and structure prediction of three RNA sequences
Comparative RNA sequence analyses have contributed remarkably accurate predictions. The recent determination of the 30S and 50S ribosomal subunits has brought more supporting evidence. Several inference tools combine free energy minimisation and comparative analysis to improve the quality of secondary structure predictions. This paper investigates the following hypotheses: the use of three i...
Journal title:
- Bioinformatics
Volume 29, Issue 5
Pages -
Publication date 2013